Semiparametric Principal Component Analysis

Authors

  • Fang Han
  • Han Liu
Abstract

In this paper, we propose two new principal component analysis methods based on a semiparametric model. The methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the data are multivariate Gaussian. COCA and Copula PCA estimate the leading eigenvectors of, respectively, the correlation and the covariance matrix of this latent Gaussian distribution. Estimation relies on Spearman's rho, a robust nonparametric rank-based correlation coefficient estimator. We prove that, under suitable conditions, even though the marginal distributions can be arbitrary continuous distributions, the COCA and Copula PCA estimators achieve fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on synthetic and real data back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).
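The rank-based idea behind COCA can be illustrated with a minimal, non-sparse sketch. It assumes NumPy and continuous data without ties, and it uses the standard Gaussian-case identity between Spearman's rho and the Pearson correlation, Sigma_jk = 2 sin(pi/6 * rho_jk), to build a plug-in estimate of the latent correlation matrix before taking its leading eigenvectors. The function names are hypothetical, the feature-selection (sparse) step analysed in the paper is omitted, and so is the marginal scale estimation that Copula PCA would need for the covariance version.

```python
import numpy as np


def spearman_rho_matrix(X):
    """Spearman's rho between every pair of columns of X (n samples x d features).

    Assumes continuous data (no ties), so ranks come from a double argsort.
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    # Spearman's rho is the Pearson correlation of the ranks.
    return np.corrcoef(ranks, rowvar=False)


def latent_correlation(X):
    """Plug-in estimate of the latent Gaussian correlation matrix.

    Uses the Gaussian-case identity Sigma_jk = 2 * sin(pi / 6 * rho_jk);
    the diagonal maps to 2 * sin(pi / 6) = 1.
    """
    return 2.0 * np.sin(np.pi / 6.0 * spearman_rho_matrix(X))


def coca_leading_eigvecs(X, k=1):
    """Non-sparse sketch: top-k eigenpairs of the latent correlation estimate."""
    S = latent_correlation(X)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]


# Usage: latent Gaussian data passed through hypothetical monotone marginals.
rng = np.random.default_rng(0)
d, n = 4, 2000
Sigma = np.full((d, d), 0.6) + 0.4 * np.eye(d)  # equicorrelation 0.6
Z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3, np.tanh(Z[:, 2]), Z[:, 3]])

vals, vecs = coca_leading_eigvecs(X, k=1)
print(vals[0], vecs[:, 0])
```

Because ranks are unchanged by strictly increasing transformations, the Spearman matrix of `X` equals that of the latent `Z`, which is why the marginals never need to be estimated. For this equicorrelation example the true leading eigenvalue is 1 + 3 * 0.6 = 2.8 with eigenvector (1, 1, 1, 1)/2, up to sign.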

Similar resources

Nonlinear Multidimensional Data Projection and Visualisation

Multidimensional data projection and visualisation are becoming increasingly important and have found wide applications in many fields such as decision support, bioinformatics and web/document organisation. Various methods and algorithms have been proposed as either nonparametric or semiparametric approaches. This paper provides an overview of the subject and reviews some recent developments. R...

Principal Component Analysis on non-Gaussian Dependent Data

In this paper, we analyze the performance of a semiparametric principal component analysis named Copula Component Analysis (COCA) (Han & Liu, 2012) when the data are dependent. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. We study the scenario where the observations are drawn from non-i.i.d. processes ...

Semiparametric principal component poisson regression on clustered data

In modelling count data with multivariate predictors, we often encounter problems with clustering of observations and interdependency of predictors. We propose to use principal components of predictors to mitigate the multicollinearity problem and to abate information losses due to dimension reduction; a semiparametric link between the count dependent variable and the principal components is po...

Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model

In this paper we focus on principal component regression and its application to high-dimensional non-Gaussian data. The major contributions are twofold. First, in low dimensions and under the Gaussian model, by borrowing strength from recent developments in minimax optimal principal component estimation, we sharply characterize, for the first time, the potential advantage of classical principal co...

On Estimating the Mixed Effects Model

This paper introduces a new estimation method for time-varying individual effects in a panel data model. An important application is the estimation of time-varying technical inefficiencies of individual firms using the fixed effects model. Most models of the stochastic frontier production function require rather strong assumptions about the distribution of technical inefficiency (e.g., half-nor...

Publication date: 2012